The first step of this assignment is to load all of the packages we
need. Next, we read in the gapminder_clean.csv data and
assign it to gapminder. This is the original
dataset, and every subsequent graph will use parts of it. To
avoid errors later on, the NA values were changed to 0.
library(tidyverse)
library(dplyr)
library(ggplot2)
library(plotly)
library(styler)
library(kableExtra)
gapminder <- (read.csv("gapminder_clean.csv")) %>%
as_tibble()
gapminder[is.na(gapminder)] <- 0

This is the original gapminder dataset, which shows a variety of data for different countries from 1962 to 2007. The table below shows how the countries are organized into rows by year. Each country is recorded at five-year intervals (Afghanistan, for example, has ten rows from 1962 to 2007). This snapshot shows the countries “Afghanistan” and “Albania”; more countries and variables will be investigated throughout this markdown.
gapminder[1:15,1:5] %>%
kbl(caption = "Gapminder dataset snapshot") %>%
kable_styling(bootstrap_options = "striped", full_width = T, html_font = "Cambria")

| X | Country.Name | Year | Agriculture..value.added....of.GDP. | CO2.emissions..metric.tons.per.capita. |
|---|---|---|---|---|
| 0 | Afghanistan | 1962 | 0.00000 | 0.0737813 |
| 1 | Afghanistan | 1967 | 0.00000 | 0.1237824 |
| 2 | Afghanistan | 1972 | 0.00000 | 0.1308201 |
| 3 | Afghanistan | 1977 | 0.00000 | 0.1831183 |
| 4 | Afghanistan | 1982 | 0.00000 | 0.1658791 |
| 5 | Afghanistan | 1987 | 0.00000 | 0.2755603 |
| 6 | Afghanistan | 1992 | 0.00000 | 0.1013748 |
| 7 | Afghanistan | 1997 | 0.00000 | 0.0607977 |
| 8 | Afghanistan | 2002 | 38.47194 | 0.0411292 |
| 9 | Afghanistan | 2007 | 30.62285 | 0.0878576 |
| 10 | Albania | 1962 | 0.00000 | 1.4399560 |
| 11 | Albania | 1967 | 0.00000 | 1.3637463 |
| 12 | Albania | 1972 | 0.00000 | 2.5159144 |
| 13 | Albania | 1977 | 0.00000 | 2.2758764 |
| 14 | Albania | 1982 | 31.69971 | 2.6248568 |
The first question we ask is: what is the relationship between CO2 emissions (metric tons per capita) and GDP per capita in 1962 across all countries? First we filter our data into a new variable named gapminder1962data that holds every country’s 1962 data for CO2 emissions and GDP per capita.
gapminder1962data <- gapminder %>%
filter(Year == 1962) %>%
select(Country.Name, gdpPercap, Year, CO2.Emissions = CO2.emissions..metric.tons.per.capita.)

This is a rough scatterplot of the 1962 data we just created. It is very plain, but we will add more advanced plotting features later.
ggplot(gapminder1962data, mapping = aes(x = CO2.Emissions, y= gdpPercap)) +
geom_point() +
labs(
x = "CO2 emissions (metric tons per capita)",
y = "GDP per capita",
title = "CO2 Emission Compared With GDP Per Capita in 1962"
)

Looking at the graph above, an important question to ask is “what is the correlation between x and y?” The code below prints the correlation between CO2 emissions and GDP per capita for 1962.
gapminder1962data %>%
group_by(Year) %>%
summarize(cor = cor(CO2.Emissions,gdpPercap)) %>%
kbl(caption = "correlation for 1962") %>%
kable_styling(bootstrap_options = "striped",full_width = F, html_font = "Cambria", position = "left")

| Year | cor |
|---|---|
| 1962 | 0.6763285 |
P-values are also very important for statistical analysis. The code below still uses the 1962 data. (The actual p-value is 5.53e-36, but the table shows 0 because the value is too small to survive rounding.)
p_value <- cor.test(gapminder1962data$CO2.Emissions,gapminder1962data$gdpPercap)
p_value$p.value %>%
kbl(caption = "p-value for 1962") %>%
kable_styling(bootstrap_options = "striped",full_width = F, html_font = "Cambria", position = "left")

| x |
|---|
| 0 |
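
A very small p-value can still be displayed if it is converted to text before it reaches the table. The following is a minimal sketch using base R's format(); the vectors x and y are made-up toy data, not the gapminder columns.

```r
# Toy data (hypothetical, not from gapminder): a clear linear relationship
set.seed(1)
x <- 1:50
y <- 2 * x + rnorm(50, sd = 10)

test <- cor.test(x, y)

# scientific = TRUE keeps the exponent instead of letting the value round to 0
p_text <- format(test$p.value, scientific = TRUE, digits = 3)
print(p_text)
```

The resulting string could then be passed to kbl() in place of the raw number.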
This code computes the correlation between CO2 emissions and GDP per capita for every year in our gapminder dataset, then keeps the year with the highest correlation and prints it.
gapminderhighcor<- gapminder %>%
select(Country.Name,Year,gdpPercap,CO2.Emissions = CO2.emissions..metric.tons.per.capita.) %>%
group_by(Year) %>%
summarize(cor = cor(CO2.Emissions, gdpPercap)) %>%
top_n(1,cor)
gapminderhighcor %>%
kbl(caption = "The year with the highest correlation value") %>%
kable_styling(bootstrap_options = "striped",full_width = F, html_font = "Cambria", position = "left")

| Year | cor |
|---|---|
| 1962 | 0.6763285 |
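
The "one correlation per year, then pick the largest" step can be sketched in base R as well. The data frame toy and its column names co2 and gdp are invented for illustration and do not come from the gapminder file.

```r
# Hypothetical toy data: two years, four observations each
toy <- data.frame(
  Year = rep(c(1962, 1967), each = 4),
  co2  = c(1, 2, 3, 4, 1, 2, 3, 4),
  gdp  = c(1, 2, 3, 4, 1, 3, 2, 4)
)

# Split the rows by year and compute one Pearson correlation per year
cor_by_year <- sapply(split(toy, toy$Year), function(d) cor(d$co2, d$gdp))

# which.max() gives the position of the largest correlation
best_year <- names(which.max(cor_by_year))

print(cor_by_year)
print(best_year)
```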
Next we build a new dataset named energyuse from the Energy use (kg of oil equivalent per capita) column. The code groups the data by continent, averages each continent's energy use from 1962 to 2007, and prints the result.
continents <- c("Africa","Americas","Asia","Europe","Oceania")
energyuse <- gapminder %>%
select(continent, EnergyUse = Energy.use..kg.of.oil.equivalent.per.capita.) %>%
group_by(continent ) %>%
filter(continent %in% continents) %>%
summarize(Avg_Energy_Use= mean(EnergyUse))
energyuse %>%
kbl(caption = "Average energy use per continent") %>%
kable_styling(bootstrap_options = "striped",full_width = F, html_font = "Cambria", position = "left")

| continent | Avg_Energy_Use |
|---|---|
| Africa | 295.755 |
| Americas | 1334.503 |
| Asia | 1360.027 |
| Europe | 2684.640 |
| Oceania | 3980.314 |
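
Because NA values were replaced with 0 when the data was loaded, those zeros are counted in each continent's mean. The sketch below uses a made-up vector (not gapminder values) to show how a zero-filled mean differs from one computed with mean(x, na.rm = TRUE).

```r
# Hypothetical energy-use values with two missing observations
energy <- c(1000, 2000, NA, 3000, NA)

# Zero-filling treats the missing observations as real zeros
zero_filled <- energy
zero_filled[is.na(zero_filled)] <- 0
mean_zero_filled <- mean(zero_filled)          # (1000 + 2000 + 0 + 3000 + 0) / 5

# na.rm = TRUE drops the missing observations before averaging
mean_na_removed <- mean(energy, na.rm = TRUE)  # (1000 + 2000 + 3000) / 3

print(c(zero_filled = mean_zero_filled, na_removed = mean_na_removed))
```

Which of the two is appropriate depends on whether a missing observation should really count as zero energy use.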
This plotly graph shows energy usage as a histogram, with each continent in its own smaller panel. Rows with an energy use of 0 are excluded.
energygraph <- gapminder %>%
select(Country.Name,Year, continent,EnergyUse = Energy.use..kg.of.oil.equivalent.per.capita.) %>%
filter(continent %in% continents, EnergyUse != 0) %>%
ggplot(mapping = aes(EnergyUse, col = continent)) +
geom_histogram() +
labs(
x = "Energy use (kg of oil equivalent per capita)",
y = "Count",
title = "Energy Usage Per Continent from 1962 to 2007") +
facet_wrap(~continent)
ggplotly(energygraph)

Next we look at just Europe and Asia in a dataset named comparedcontinent. It includes all the countries from Europe and Asia, their Imports of goods and services (% of GDP), and only the years after 1990. The average share of GDP from imports is then calculated for the two continents.
FocusContinent <- c("Europe","Asia")
comparedcontinent <- gapminder %>%
select(continent, Year, ImportEcon = Imports.of.goods.and.services....of.GDP.) %>%
filter(continent %in% FocusContinent, Year > 1990)
avgcompareddata <- comparedcontinent %>%
group_by(continent) %>%
summarize(average.income = mean(ImportEcon))
avgcompareddata %>%
kbl(caption = "Comparison of Europe and Asia's GDP% from imports for years after 1990") %>%
kable_styling(bootstrap_options = "striped",full_width = F, html_font = "Cambria", position = "left")

| continent | average.income |
|---|---|
| Asia | 44.14270 |
| Europe | 39.69978 |
This plotly graph shows the trend in Imports of goods and services (% of GDP) for Europe and Asia from 1992 to 2007.
comparedgraph <- ggplot(comparedcontinent, mapping = aes(x = Year, y = ImportEcon, col = continent)) +
geom_smooth() +
labs(
x = "Years",
y = "GDP% from imports",
title = "GDP% Growth from Imports from 1990 to 2007" ) +
facet_wrap(~continent)
ggplotly(comparedgraph)

This is a new dataset, density, which focuses on Population density (people per sq. km of land area).
density <- gapminder %>%
select(Country.Name,Year,Population.density = Population.density..people.per.sq..km.of.land.area.)

This loop steps through the density dataset one five-year interval at a time, from 1962 to 2007, and prints the country with the highest population density in each of those years.
X <- 1962
while(X != 2012) {
innerdata <-density %>%
filter(Year == X) %>%
top_n(1,Population.density)
print(paste(innerdata[1,1], "had the record population density of people per sq km of land area of",
innerdata[1,3], "on the year", innerdata[1,2]))
X <- X + 5
}

## [1] "Monaco had the record population density of people per sq km of land area of 11521 on the year 1962"
## [1] "Monaco had the record population density of people per sq km of land area of 11648.5 on the year 1967"
## [1] "Macao SAR, China had the record population density of people per sq km of land area of 12714.1 on the year 1972"
## [1] "Monaco had the record population density of people per sq km of land area of 12904.5 on the year 1977"
## [1] "Monaco had the record population density of people per sq km of land area of 13814.5 on the year 1982"
## [1] "Macao SAR, China had the record population density of people per sq km of land area of 16132.75 on the year 1987"
## [1] "Macao SAR, China had the record population density of people per sq km of land area of 18889.95 on the year 1992"
## [1] "Macao SAR, China had the record population density of people per sq km of land area of 20601.55 on the year 1997"
## [1] "Macao SAR, China had the record population density of people per sq km of land area of 16451.037037 on the year 2002"
## [1] "Monaco had the record population density of people per sq km of land area of 17523 on the year 2007"
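
The same per-year lookup can also be written without an explicit loop. This is a sketch, assuming dplyr 1.0.0 or later for slice_max(); the small density_toy data frame is invented for illustration and stands in for density.

```r
library(dplyr)

# Hypothetical stand-in for the density dataset (values are made up)
density_toy <- data.frame(
  Country.Name = c("A", "B", "A", "B"),
  Year = c(1962, 1962, 1967, 1967),
  Population.density = c(10, 20, 40, 30),
  stringsAsFactors = FALSE
)

# For each year, keep the single row with the highest population density
densest_per_year <- density_toy %>%
  group_by(Year) %>%
  slice_max(Population.density, n = 1, with_ties = FALSE) %>%
  ungroup()

print(densest_per_year)
```

The result is one row per year in a single table, which can be passed to kbl() like the other summaries in this markdown.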
This plotly graph plots population density for every country and year, so the highest values in each year stand out visually.
densitygraph <- ggplot(density, mapping = aes(x=Year,y=Population.density,col= Country.Name)) +
geom_point() +
labs(
x = "Years",
y = "Population Density (people per sq. km of land area)",
title = "Population Density of countries from 1962 to 2007"
)
ggplotly(densitygraph)

The lifeexp dataset looks at Life expectancy at birth, total (years) for each country over the years. We want to find the largest increases in life expectancy from 1962 to 2007. The code below outputs the five countries with the greatest difference between their maximum and minimum life expectancy over that period.
lifeexp <- gapminder %>%
select(Country_Name = Country.Name,Year, Life.expectancy.yrs = Life.expectancy.at.birth..total..years.)
toplifeexp <- lifeexp %>%
group_by(Country_Name) %>%
summarize(Life_exectancy_increase = max(Life.expectancy.yrs)-min(Life.expectancy.yrs)) %>%
top_n(5,Life_exectancy_increase)
toplifeexp %>%
kbl(caption = "Countries with the Greatest Increase to Life Expectancy") %>%
kable_styling(bootstrap_options = "striped",full_width = F, html_font = "Cambria", position = "left")

| Country_Name | Life_exectancy_increase |
|---|---|
| Bermuda | 78.93415 |
| Faroe Islands | 79.83659 |
| Liechtenstein | 81.29512 |
| San Marino | 82.50610 |
| St. Martin (French part) | 78.22195 |
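
Because missing values were set to 0 when the data was loaded, any country with a missing year has a minimum of 0, so max minus min simply returns its highest recorded life expectancy. As a hedged alternative, the sketch below computes the change between each country's earliest and latest non-missing observations. The lifeexp_toy data frame and its values are invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for lifeexp; country "B" has no 1962 observation
lifeexp_toy <- data.frame(
  Country_Name = c("A", "A", "B", "B"),
  Year = c(1962, 2007, 1962, 2007),
  Life.expectancy.yrs = c(50, 70, NA, 80),
  stringsAsFactors = FALSE
)

life_change <- lifeexp_toy %>%
  filter(!is.na(Life.expectancy.yrs)) %>%   # drop missing observations
  arrange(Country_Name, Year) %>%           # so first()/last() follow time order
  group_by(Country_Name) %>%
  summarize(
    first_year = first(Year),
    last_year = last(Year),
    change = last(Life.expectancy.yrs) - first(Life.expectancy.yrs)
  )

print(life_change)
```

Keeping first_year and last_year in the output makes it visible when a country has only one usable observation, as with "B" here, where the change is 0.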
This plotly graph plots life expectancy over time for the five countries in the table above.
toplifecontries <- c("San Marino","Faroe Islands",
"Bermuda","Liechtenstein","St. Martin (French part)")
filteredlifeexp <- lifeexp %>%
filter(Country_Name %in% toplifecontries)
lifegraph <- filteredlifeexp %>%
ggplot(mapping = aes(x=Year, y= Life.expectancy.yrs, col = Country_Name)) +
geom_smooth() +
labs(
x = "Year",
y = "Life Expectancy (yrs)",
title = "Life Expectancy Comparison for Top 5 Countries" ) +
facet_wrap(~Country_Name, nrow = 2) +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1), legend.position = "none") +
scale_y_continuous(breaks = seq(0,100,20),
minor_breaks = seq(0,100,5))
ggplotly(lifegraph)